/*
 This program creates the txt files, built from the raw survey data, that serve 
 as input to R for the text analysis.
 
 Note that a few manual clean-ups of the txt files, described at the end of 
 this program, must be made before running the R code.
 
 Data from R are loaded back into Stata in pt. 2 of this program to generate 
 respondent-level word analyses and word tables (Tables A-2, A-3, and A-4 of the appendix).
*/

clear all
set more off

cd "$localdir/Data"

*** Read survey data and save a text file with the relevant question
* Question q63: Issues facing MSA

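* Load the survey data
use "STAN0107_OUTPUT_8msas.dta", clear
* Row counter; note that it is dropped by the keep below, so only Q63 is exported
g count=_n
keep Q63 
* Write the responses to q63.txt (by default the first line holds the variable name)
export delimited "q63.txt", replace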


*** Manually clean the txt file before running the R code
* The "'" symbol (used, for instance, in "don't") shows up as "í", "ì", or "î". Correct this with "find and replace" in the text file.
* There is one "ñ" in the text; replace it with "and" in the text file.
* There is one sentence with three "ó" symbols; replace each with a blank space " ".
* There is one "·" in a Spanish sentence; delete it.

*** A new txt file is saved as _clean.txt after the clean-up
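
*** Optional sketch: scripting the clean-up instead of doing it by hand
* This is a hedged illustration, not the procedure used to produce the results.
* It ASSUMES that the stray characters listed above are single bytes in q63.txt,
* namely Windows-1252 bytes displayed in a Mac Roman editor:
*   "í" = 0x92, "ì" = 0x93, "î" = 0x94, "ñ" = 0x96, "ó" = 0x97, "·" = 0xE1
* Check the actual bytes first (for instance with: hexdump "q63.txt", analyze).
* Each filefilter call replaces EVERY occurrence of its byte, so confirm that
* the counts it reports match the manual notes above.
* The output name "q63_autoclean.txt" is hypothetical and is chosen so that the
* manually cleaned file is never overwritten.

tempfile s1 s2 s3 s4 s5
filefilter "q63.txt" "`s1'", from("\92h") to("\RQ")   // "í" -> apostrophe
filefilter "`s1'"    "`s2'", from("\93h") to("\RQ")   // "ì" -> apostrophe
filefilter "`s2'"    "`s3'", from("\94h") to("\RQ")   // "î" -> apostrophe
filefilter "`s3'"    "`s4'", from("\96h") to("and")   // "ñ" -> "and"
filefilter "`s4'"    "`s5'", from("\97h") to(" ")     // "ó" -> blank space
filefilter "`s5'" "q63_autoclean.txt", from("\E1h") to("") replace   // delete "·"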












